[57] ABSTRACT

An apparatus for providing described television services
includes a receiver for receiving description data corre- 
sponding to an audiovisual program; a text-to-speech con- 
verter for converting the description data into a speech 
signal corresponding to the description data; a memory 
device for receiving and storing the speech signal and a 
corresponding time code from the audiovisual program; a 
mixing circuit for retrieving the speech signal from the 
memory device and mixing the retrieved speech signal with 
the audio track of the audiovisual program to produce a 
combined audio signal; and a transmitter for simultaneously 
providing the combined audio signal and the audiovisual
program to a viewer. The apparatus provides the combined
audio signal to the viewer via the SAP channel. The
apparatus may also include a translator for translating the 
description data into a foreign language prior to converting 
the description data into the speech signal. 

A method for providing described television services 
includes the steps of generating description data correspond- 
ing to an audiovisual program; converting the description 
data to a speech signal corresponding to the description data; 
synchronizing the speech signal with the audiovisual pro- 
gram using a time code signal from the audiovisual program; 
mixing the synchronized speech signal with the audio track 
of the audiovisual program to create a combined audio 
signal; and simultaneously transmitting the combined audio 
signal and the audiovisual program to the viewer. 

12 Claims, 6 Drawing Sheets 
SYSTEM AND METHOD FOR PROVIDING
DESCRIBED TELEVISION SERVICES

RELATED APPLICATIONS

This application is a continuation-in-part of U.S. patent
application Ser. No. 08/398,165, filed Mar. 2, 1995 of the
same inventor.

FIELD OF THE INVENTION

The present invention relates to an apparatus and method
for providing described television services by which a
viewer is provided with an audio description of non-spoken
aspects of a television program, for example, the program's
background scenery or non-verbal actions of characters.

BACKGROUND OF THE INVENTION

Television programs, plays, and other audiovisual types of
presentations often include both an audio component and a
visual component, each of which conveys information to a
viewer. Closed captions may be provided for audiovisual
programs and presentations in order to allow people with
impaired hearing to follow the audio component of the
program or presentation. Similarly, an audio description of
the presentation may be provided to enable people with
impaired vision to follow the visual component of the
program or presentation. The provision of such an audio
description for television programs is referred to as
described television services.

Currently, described television services are not widely
available. Several television stations, such as WGBH
(Boston, Mass.), provide some described television
programs using the second audio program (SAP) of these
television programs to transmit the description information.

Described television programs using SAP are currently
produced using a process as follows. First, an original tape
of the final version of the program, including all dialogue
and sound effects, is obtained from a network. If the copy
includes a SAP track, for example, a Spanish version of the
audio track of the program, this track is lost in the process
of providing description information. Second, editors
prepare concise, typewritten descriptions of the scenes of the
program. Third, one or more professional speakers read the
descriptions. Typically, a single speaker reads the
descriptions in a soft voice, almost as if he/she was
whispering the scene details to the blind viewer. Fourth, the
"final" audio track from the original program is re-mastered
to be monophonic (SAP can only carry a mono signal) and to
include the descriptions as high fidelity monophonic signals.
This re-mastering process requires a broadcast quality audio
facility. Fifth, the original program is re-mastered to record
the SAP track along with the other signals (e.g., the
program's video and audio signals) which, according to
present television capabilities, may be compressed digital
audio stereophonic signals. Finally, the new tape, which is a
"generation" down from the original program tape, is
returned to the network.

Once the network has received the tape, it may either
broadcast the new tape or "slave" the original tape and the
new tape together using two tape decks and coordinating the
two signals using the standard (SMPTE) time code of the
tapes. In the "slave" process, the original tape is used to
provide the video and sound (e.g., stereo) of the program,
and the new tape is used to provide the SAP track for the
program which is inserted into the signal at broadcast time.
The slave process has an advantage in that, if the original
program has a SAP track (e.g., a Spanish SAP), the original
SAP track may be used on some occasions and the
description SAP track on other occasions.

There are also audio tape decks that may be slaved to
[illegible] to provide the description SAP track for the
program. [Illegible] an advantage in that it eliminates the
need to re-master the audio track of the original program as
described above.

Thus, since the known method of providing described
television services is cumbersome, there is a need for a
method of providing described television services by which
descriptions are easily produced and transmitted to
viewers.

SUMMARY OF THE INVENTION

In view of the above discussion, it is an object of the
present invention to provide an innovative apparatus and
method for providing described television services.

According to a first embodiment of the present invention,
described television services for a program are provided by
encoding the descriptions as text characters into the vertical
blanking interval of the video signal of the program, for
example, in line 21 of the standard NTSC television signal.
The data signal on line 21 consists of independent data on
field 1 and field 2. Each data channel may contain specific
types of data packets as shown in the following table.

Field 1 Packets                  Field 2 Packets

CC1 (F1,C1)-Primary              CC3 (F2,C1)-Secondary
  Synchronous Captions             Synchronous Captions
CC2 (F1,C2)-Special Non-         CC4 (F2,C2)-Secondary Non-
  Synchronous Captions             Synchronous Captions
T1 (First Text Service)          T3 (Third Text Service)
T2 (Second Text Service)         T4 (Fourth Text Service)
                                 EDS-Extended Data Service

The Primary Synchronous Caption Service CC1 is primary
language (e.g., English) captioning data that must be in
sync with the sound of a program, preferably in sync with
a specific frame. The Secondary Synchronous Caption
Service CC3 is an alternate captioning data channel usually
used for second language captions.

The Special Non-Synchronous channels CC2 and CC4
carry data that is intended to augment information carried in
the program and need not be in sync with the sound. Delays
of several seconds within the program are to be expected and
do not affect the integrity of the data.

Text Service data are generally not program related. The
data are generally displayed as soon as they are received and
are intended to be displayed in a manner which isolates them
from the video program used to transmit the data. Once the
display window created by the decoder is filled, text data are
scrolled upward through the display window.

The Extended Data Service (EDS) is a third data service
on field 2 which is intended to supply program related and
other information to the viewer. Types of information
provided by EDS include current program, title, length of the
program, type of program, time of program, time remaining
in program, and other types of program-related information.
This information may be used, for example, to help a viewer
determine what program is on even during a commercial.
Future program and weather alert information may also be
displayed.

Further description of the line 21 data services,
recommended formats of each service, and other detailed
information is provided in the Electronic Industries
Association publication of September, 1994 entitled "EIA-608
Recommended Practice for Line 21 Data Service." Moreover,
the present invention should not be considered limited to the
NTSC standard television signal. One of ordinary skill in the
art practicing the present invention may adapt the present
technology suitably for PAL, SECAM, high definition
[illegible], or other compression video formats as
appropriate.

Description data as used in the present application and
claims may be defined as auxiliary data transmitted for the
purpose of describing the non-verbal portion of an
audiovisual program. Description data typically may comprise
text data, but also may comprise compressed text or
graphical, symbolic or numeric data. The description data
may share channel C1 using a special marker to indicate
which data is caption data and which is description data.
Since descriptions, by definition, occur when the actors are
not speaking, the caption data and the description are
complementary and may be transmitted on the same
channel. The description data may alternatively be carried on
[illegible] CC3 or CC4, which are not currently in use for
captioning; or Extended Data Services (EDS) as defined by
the Electronic Industries Association (EIA) if coding is
developed consistent with the EIA recommendations, or any
other line of the vertical blanking interval. Text services
(T1-4) as defined by the EIA may also be used to carry the
description data.

In the first embodiment of the present invention, a
decoder, e.g., a set-top or built-in decoder, extracts and
stores description text characters received as a component of
the television program signal. When a complete utterance is
received, a "speak" command similar to a "display"
command for captions is received. The "speak" command
triggers the input of the stored description text into a
text-to-speech synthesizer which generates audible speech
corresponding to the description text. The synthesized voice
may be provided to the viewer using a secondary speaker
attached to the set-top unit or using the built-in television
speaker when the decoder unit is built-in to the television
set. The synthesized voice may also be transmitted to a blind
viewer using wire or wireless technology (e.g., infrared or
frequency modulated (FM)). The transmitted information
may also be provided to the viewer via, for example, a
personal loudspeaker, headset, or "ear bud." The transmitted
information may include either the descriptions only or both
the descriptions and the audio track of the program.

A corresponding apparatus for providing described
television services according to the present invention
includes a computer for receiving and storing description
data corresponding to an audiovisual program; an encoder for
encoding the description data into a program signal
corresponding to the audiovisual program and transmitting
the encoded program signal; a receiver for receiving the
encoded program signal, extracting the description data from
the encoded program signal, and outputting the description
data; a text-to-speech converter for converting the
description data into a speech signal corresponding to the
description data; and a speaker for providing the speech
signal to a viewer.

A corresponding method for providing described
television services according to the present invention
includes the steps of generating description data
corresponding to an audiovisual program; encoding the
description data into a program signal of the audiovisual
program; transmitting the encoded program signal; receiving
and decoding the encoded program signal, whereby the
description data is extracted from the encoded program
signal; converting the description data to a speech signal
corresponding to the description data; and providing the
speech signal to a viewer. The described television text may
be compressed prior to encoding into the audiovisual
program signal and decompressed at the receiver.

In a second embodiment of the present invention,
description data is stored in a file along with corresponding
time code information [illegible] the audiovisual program
for which description is provided. The description data is
subsequently transmitted to a speech synthesizer which
generates a speech signal corresponding to the description
data. Alternatively, the description data may be spoken by a
[illegible]. The output of the synthesizer or digital audio
tape is [illegible] to create a combined audio signal
including both the original audio [illegible]. The combined
audio signal is inserted, for example, into the SAP channel
and transmitted simultaneously with the normal video and
soundtrack of the video program.

In this embodiment, the need for encoding and decoding
the description data is eliminated. Also, interference and/or
bandwidth concerns are eliminated because the description
data is not transmitted as part of Line 21 or other line of the
vertical blanking interval of the video signal. Moreover,
consumers can use existing SAP receivers (built-in to stereo
television sets) to receive described television services.

A corresponding method for providing described
television services according to the present invention
includes the steps of generating description data
corresponding to an audiovisual program; storing the
description data into a file which also includes corresponding
time code signals from the audiovisual program; converting
the description data to a speech signal corresponding to the
description data using, for example, a speech synthesizer or
a digital audio tape recorder; mixing the speech signal with a
soundtrack of the video program to provide a combined audio
signal; and simultaneously transmitting the video program
and the combined audio signal to a viewer. The combined
signal may be transmitted to the viewer via the SAP channel.

The foregoing and other features, aspects, and advantages
of the present invention will become more apparent from the
following detailed description when read in conjunction
with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 provides a block diagram of a first embodiment of
an apparatus for providing described television services
according to the present invention.

FIG. 2 provides a block diagram of the text-to-speech
processor provided at the viewer's location, for example, as
a set-top or built-in unit of a television set.

FIG. 3 provides a diagram of a first method of providing
described television services according to the present
invention.

FIG. 4 provides a block diagram of a second embodiment
of the apparatus for providing described television services
according to the present invention.

FIG. 4A provides a block diagram of a third embodiment
of the apparatus for providing described television services
according to the present invention.

FIG. 5 provides a diagram of a second method of
providing described television services according to the
present invention.
DETAILED DESCRIPTION

With reference to FIG. 1, an apparatus 100 for providing 
described television services according to the present inven- 
tion includes a receiver 101, an apparatus which receives a 
television program to be described; a description preparation 
apparatus 102 such as a personal computer which
receives a text description of the program to be described
entered by a stenotypist, caption editor or typist and gener- 
ates and stores description data; and an encoder 104 which 
inserts the description data into, for example, line 21 of the 
program's vertical blanking interval. An optional caption 
preparation apparatus 103 may be used which receives 
caption text entered by a stenotypist or typist and generates 
and stores caption data. The caption data is preferably 
entered and stored in the same computer as is the description 
data. The caption data is also inserted into line 21 of the 
vertical blanking interval of the program signal by encoder 
104. The encoder 104 then transmits the program signal, 
including the description data and caption data (optional) to 
a receiver which may take the form of, for example, a set-top
unit or a built-in unit for a viewer's television set.

Description data may also be provided using Automated 
Live Encoding (ALE) wherein the network video is broad- 
cast live and description data (and caption data) are provided 
to the encoder from a remote location (where description 
data is prepared) via modem. According to this embodiment 
of the system according to the present invention, a perma- 
nent record of the description data (and caption data) would 
be stored at the location where description data is prepared, 
but not at the network or post production location. The same 
process is repeated each time a program is broadcast with
description and/or caption data. A system for displaying and 
encoding data such as that described in U.S. Ser. No. 
08/215,567, filed Mar. 22, 1994, incorporated herein by 
reference, may be used in this embodiment of the present 
invention. 

The program receiver 101 may receive the program to be 
described, for example, via live transmission, via satellite, 
via cable, via fiber-optic cable, or from a pre-recorded tape. 
The descriptions are then prepared using a standard caption- 
ing system which may be proprietary or off-the-shelf. The 
hardware for such captioning systems may be, for example, 
an IBM®, Apple® Macintosh®, or Unix® personal com-
puter. However, any suitably equipped computer may be
used. Software used to prepare the description data may be 
the same as is used to prepare caption data. Available 
captioning programs include "Captivator"™ by Cheetah 
Systems™ of Fremont, Calif., as well as other programs
available from BASYS Automation Systems™ of Yonkers,
N.Y.; Closed Captioning Services™ of Grand Rapids, 
Mich.; and SoftTouch™ of Alexandria, Va. These compa- 
nies offer software for the creation of real-time captions, 
off-line captions, or both. 

Also, an automatic speech recognition system such as that 
described in U.S. patent application Ser. No. 08/398,585, 
filed Mar. 2, 1995 and entitled "Automatic Speech Recog- 
nition System and Method for Closed Caption Production 
and Presentation," incorporated herein by reference, may 
also be used to prepare the description data in the apparatus 
according to the present invention. 

The operation of the "head end" of the apparatus accord-
ing to the present invention at which the description data is
generated and later transmitted to individual viewers will 
now be described in detail. Working from an audiovisual 
program to be described, a description editor (a person) 
prepares descriptions for the program. The editor enters the 
descriptions into a computer equipped with a captioning
software program such as any of those listed above. The
descriptions are entered using a standard time code which
enables coordination of the audio track of the program (e.g.,
dialogue and sound effects) and the description. As a result,
the descriptions are provided at intervals during the program 
when dialogue and/or sound effects are absent or less 
prominent. 
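
The scheduling idea in the preceding paragraph can be illustrated with a minimal sketch. It assumes that dialogue intervals are already known as pairs of start and end times in seconds (a simplification of the time code); the function name `find_description_slots` and the `min_gap` threshold are hypothetical and do not come from the present disclosure.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds), a simplified time code


def find_description_slots(dialogue: List[Interval],
                           program_length: float,
                           min_gap: float = 2.0) -> List[Interval]:
    """Return the gaps between dialogue intervals that last at least min_gap seconds.

    These gaps are candidate positions for descriptions, since descriptions
    are meant to fall where dialogue is absent.
    """
    slots: List[Interval] = []
    cursor = 0.0  # end of the latest dialogue seen so far
    for start, end in sorted(dialogue):
        if start - cursor >= min_gap:
            slots.append((cursor, start))
        cursor = max(cursor, end)
    if program_length - cursor >= min_gap:
        slots.append((cursor, program_length))
    return slots


dialogue = [(5.0, 12.0), (13.0, 20.0), (30.0, 31.0)]
slots = find_description_slots(dialogue, program_length=40.0)
```

In this example the one-second pause between the first two dialogue intervals is skipped because it is shorter than the assumed minimum gap.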

For example, in generating a description for the movie
"The Terminator"™, after the Terminator says "I'll be
back," the description editor may add the following descrip-
tion: "The Terminator turns and walks out of the police
station." Similarly, after loud noises are heard, the descrip-
tion editor inserts the following description: "The Termina-
tor rams his van through the front door of the police station
and starts shooting."

This example also illustrates the compatibility between 
caption data and description data, in that there is little or no 
overlap at the time of presentation. Caption data is provided
during dialogue intervals, while description data is provided
during non-dialogue intervals. As a result, according to one 
embodiment of the apparatus according to the present 
invention, caption data and description data are simulta- 
neously entered by the editor using the same software. In 
this way, the software program may advise the editor when 
transmission bottlenecks occur, thus allowing the editor to 
change the caption and/or description data to fit within the 
time constraints of the program. 
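
A bottleneck check of the kind mentioned above might look like the following sketch. It is illustrative only: the `Cue` record, the function `find_bottlenecks`, and the simple channel model (cues sent back to back in order of their time codes, each needing to arrive in full before its time code) are assumptions, as is the rate of 60 characters per second, which corresponds to two bytes per frame on one field of line 21 at roughly 30 frames per second.

```python
from dataclasses import dataclass
from typing import List

CHARS_PER_SECOND = 60.0  # assumed channel rate: 2 bytes per frame at ~30 frames/s


@dataclass
class Cue:
    start: float  # time code, in seconds, at which the text must be available
    kind: str     # "caption" or "description"
    text: str


def find_bottlenecks(cues: List[Cue], rate: float = CHARS_PER_SECOND) -> List[Cue]:
    """Return the cues that cannot be fully transmitted before their time code.

    Model (an assumption): the channel sends cues one after another in
    time-code order, starting as early as possible.
    """
    late: List[Cue] = []
    channel_free_at = 0.0
    for cue in sorted(cues, key=lambda c: c.start):
        done = channel_free_at + len(cue.text) / rate
        if done > cue.start:
            late.append(cue)  # the editor would shorten or move this cue
        channel_free_at = done
    return late


cues = [
    Cue(1.0, "caption", "x" * 30),
    Cue(1.2, "description", "y" * 60),
    Cue(5.0, "caption", "z" * 60),
]
late = find_bottlenecks(cues)
```

Here only the second cue is flagged: the first occupies the channel until 0.5 s, so the 60-character description cannot finish before its time code of 1.2 s.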

The result of the preparation step in which description 
data and caption data (optional) are prepared is a computer 
file including text, time codes, and command information 
that is used by the encoder 104 to create a videotape and/or 
live broadcast of the program. The descriptions are simply 
another form of text information that is inserted into the 
television program signal, for example, into line 21, channel 
C1, C2 or EDS. However, any line of the vertical blanking
interval may be used. 

If both description and caption data are inserted into the
same channel, for example, channel C1 of line 21, a marker,
e.g., a binary marker, must be included to identify each type 
of data such that description data is not displayed as caption 
data and vice versa. 
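
As a rough illustration of how such a marker could be used at the receiving end, the sketch below splits a shared channel into caption text and description text. The marker values and the packet representation (a marker paired with a text string) are hypothetical; the disclosure only requires that some marker, e.g., a binary marker, identify each type of data.

```python
from typing import Iterable, List, Tuple

# Hypothetical one-bit marker values; the actual encoding is not specified here.
CAPTION_MARK = 0
DESCRIPTION_MARK = 1


def split_channel(packets: Iterable[Tuple[int, str]]) -> Tuple[List[str], List[str]]:
    """Separate packets from one shared channel into captions and descriptions."""
    captions: List[str] = []
    descriptions: List[str] = []
    for mark, text in packets:
        if mark == CAPTION_MARK:
            captions.append(text)       # to be displayed on screen
        elif mark == DESCRIPTION_MARK:
            descriptions.append(text)   # to be spoken, never displayed
        else:
            raise ValueError(f"unknown marker: {mark!r}")
    return captions, descriptions


packets = [
    (CAPTION_MARK, "I'll be back."),
    (DESCRIPTION_MARK, "The Terminator turns and walks out of the police station."),
]
captions, descriptions = split_channel(packets)
```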

The encoder 104 may be located at a network or post
production facility, such that the description data is provided
to the network or post production facility via a modem or
even via parcel post (for example, in the form of a computer
diskette). The encoding of the description data into the video
signal is then performed at a location remote from the place
at which the description data is prepared.

Encoders for use as encoder 104 in the apparatus accord-
ing to the present invention are available from EEG™ of
Farmingdale, N.Y., and from SoftTouch™ of Alexandria,
Va. Each channel of descriptions and captions (optional) is
handled by a separate encoder. For example, to create a
master encoded tape including description data and caption
data, two encoders are arranged in series.

The output of the encoding process performed by encoder 
104 may be provided to a video tape and/or output as a live 
television broadcast signal. In other words, the encoded
signal may be recorded and/or transmitted normally. The
videotape may be used to feed a subsequent television
broadcast or as a master or submaster (a copy of the master
tape which is a full generation down from the master tape
and is used instead of the master tape in duplication to
prevent overuse, misuse or damaging of the master tape) for 
duplication and home video distribution. Copies may be 
distributed using videocassette, video disks, CD-ROM, and
other available forms. As long as the format remains in an
NTSC format and any compression technique used preserves
caption data, the descriptions (and captions) will remain
intact.

The present invention is not limited to analog television
applications, and may also be applied in digital television
systems, for example, by intermixing the description data
with caption data transmitted in a digital format.

With reference to FIG. 1, a receiver used in the apparatus
according to the present invention includes a reception
processor 105 which decodes the description data and
caption data (if present). If both description and caption data
have been inserted into line 21, the decoder uses markings
encoded with the data to delineate description data from
caption data. The reception processor 105 provides the
description data to a text-to-speech processor 106 and
caption data to a television picture generator 109.

The description data from the reception processor 105 is
converted from a text format to an analog speech format in
text-to-speech processor 106. The speech output is then
provided to the viewer through loudspeaker 107. Other
forms of transmitting the speech output to the viewer, such
as through a wired or wireless personal speaker, headset, or
ear bud, are also contemplated within the scope of the
present invention.

The reception processor 105 provides the program audio
signal to the television sound system 108 which transmits
the audio portion of the program to the viewer using, for
example, loudspeaker 110. The reception processor 105 also
provides the video signal of the program, including any
caption data, to the television picture generator 109 which
displays the video signal on picture display 111.

The reception processor 105 may simply pass through the
received integrated signal to the television which provides
integrated audio, video, and caption display. The description
data is extracted and provided to the text-to-speech
converter for processing. Caption data may also be provided
to the text-to-speech converter if desired.

The viewer may obtain described television services off
the air, via cable, or via video. The reception processor 105
extracts description characters from the received television
program signal and stores these characters until a complete
utterance has been received. A complete utterance is
identified by receipt of an output code or "speak" command
which tells the reception processor 105 to output the
complete utterance to the text-to-speech processor 106. The
text-to-speech processor 106 converts the description text
into an analog format (i.e., speech) which is provided to the
viewer via loudspeaker 107 or any other appropriate speaker
means, e.g., a wired or wireless personal speaker, headset or
ear bud (not shown).

The loudspeaker 107 which provides the synthesized
voice generated by text-to-speech processor 106 to the
viewer may be, for example, a secondary speaker associated
with a set-top unit, or the built-in television speakers when
the reception processor is built into the viewer's television
set. Also, multiple text-to-speech synthesizers may be used
to include a range of different voices.

With reference to FIG. 2, the text-to-speech processor 106
includes a television signal processor 201, a text-to-speech
synthesizer (a digital-to-analog converter) 202, an amplifier
203, and loudspeaker 107. The signal processor 201 extracts
description data, for example, from line 21, and stores the
data until an output code is received. When an output code
is received, the signal processor 201 sends the stored data to
text-to-speech synthesizer 202 wherein an analog speech
signal is generated. The speech signal is output to amplifier
203, where the signal is amplified and output to loudspeaker
107.

A number of off-the-shelf text-to-speech converters are
available for use in the apparatus according to the present
invention. These include products by Berkeley Speech
Technologies of Berkeley, Calif., and Digital Equipment
Corporation of Maynard, Mass. Text-to-speech converters
may be simple integrated circuits that accept digital input
characters and output an analog signal that, when amplified,
is recognizable as speech. More sophisticated text-to-speech
synthesizers use software programs which drive a
loudspeaker, for example, of the type used in currently
available multimedia personal computers. The system may
also include a combination of these two types of
synthesizers. According to one embodiment of the apparatus
according to the present invention, a set-top decoder utilizes
a built-in chip to synthesize the analog speech output.

The transmitted information (the synthesized speech) may
include only descriptions, or also include the audio track of
the program (stereo or mono) and/or a SAP track (e.g., in
Spanish). Furthermore, a mixer (not shown) may be
incorporated into the system to accept and mix the television
program audio track (stereo, mono or SAP) as one input and
the descriptions as a second input, thereby transmitting the
inputs as a single audio track. The output may be
provided in monophonic or stereophonic sound.

In addition to support for prerecorded television programs
as described above, the apparatus and method according to
the present invention may be used for live performances,
speeches, classrooms, and other types of presentations.
Further, the apparatus and method according to the present
invention may also be used for teleconferences, distance
learning programs, and other televised programming in
addition to movies and television series.

The input to the system may be a real-time stenographer
trained to key in the description text which enables
descriptions to be delivered with live programs such as news
and sporting events.

The apparatus and method according to the present
invention may also support multiple languages by including
additional language descriptions which are also encoded in
the program signal. For example, a Spanish-speaking person
may receive both the Spanish SAP and Spanish descriptions
simultaneously. In one embodiment, an automatic
translation system may be used to translate the English text
into text in a foreign language which is then "spoken" using
the text-to-speech synthesizer.

According to one embodiment of the apparatus according
to the present invention (not shown), an automatic
translation device is inserted between the reception
processor 105 and the text-to-speech processor 106 whereby
the English description text is translated into a desired
foreign language such as Spanish prior to the text-to-speech
conversion process.

Another embodiment (not shown) of the apparatus
according to the present invention includes a data
compression device by which the described television text
may be compressed prior to encoding into the audiovisual
program signal by encoder 104 and decompressed by
reception processor 105. Digital audio or text compression
may be utilized to conserve bandwidth for both the
description data and caption data. Compression and
decompression may be accomplished, for example, using any
known compression/decompression algorithm.

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With reference to FIG. 3, a method for providing 
described television services includes the steps of (301) 
generating description data corresponding to an audiovisual 
program; (302) encoding the description data into a program 
signal of the audiovisual program; (303) transmitting the 
encoded program signal; (304) receiving and decoding the 
encoded program signal, whereby the description data is 
extracted from the encoded program signal and stored in 
storage 320 until an output code is received, such that, in 
response to the output code, the description data is output to 
a texl-to-speech converter; (305) converting the description 
data to a speech signal corresponding to the description data; 
and (306) providing the speech signal to a viewer. The 
method may also include the steps of (310) generating 
caption data corresponding to the audiovisual program;
(311) encoding the caption data into the program signal;
(312) extracting the caption data from the received encoded
program signal; (313) generating captions from the caption
data; and (314) displaying the caption data to the viewer.
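
The store-until-output-code behavior of step (304) can be illustrated with a minimal sketch. The token value `<OUT>`, the class name `DescriptionBuffer`, and the function `decode_stream` are hypothetical names chosen for illustration; the specification does not define the format of the output code or of the decoded data.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical control token: the specification does not define the
# actual value or encoding of the "output code".
OUTPUT_CODE = "<OUT>"


@dataclass
class DescriptionBuffer:
    """Holds decoded description text until an output code arrives."""

    _pending: List[str] = field(default_factory=list)

    def feed(self, token: str) -> Optional[str]:
        """Accept one decoded token.

        Returns the buffered description when the output code is seen;
        otherwise stores the token and returns None.
        """
        if token == OUTPUT_CODE:
            text = " ".join(self._pending)
            self._pending.clear()
            return text or None
        self._pending.append(token)
        return None


def decode_stream(tokens: List[str]) -> List[str]:
    """Return the descriptions released to the text-to-speech stage."""
    buffer = DescriptionBuffer()
    released = []
    for token in tokens:
        text = buffer.feed(token)
        if text is not None:
            released.append(text)
    return released
```

In this sketch, text that has not yet been followed by an output code stays in storage and is never passed to the converter, which mirrors the role of storage 320.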

The method of providing described television services
according to the present invention may also include the steps 
(not shown) of compressing the description data and caption 
data (optional) prior to encoding the description data and 
caption data into the program signal and decompressing the 
description data and caption data prior to generating a 
speech signal from the description data and captions from 
the caption data. The data compression may be performed 
using any of the many known compression/decompression 
algorithms. 
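
As one possible illustration of the optional compression and decompression steps, the following sketch round-trips description text through DEFLATE using Python's standard `zlib` module. The choice of DEFLATE and the function names are assumptions made for the example; the specification only calls for any known compression/decompression algorithm.

```python
import zlib


def compress_text(text: str) -> bytes:
    """Compress description or caption text before encoding (assumed DEFLATE)."""
    return zlib.compress(text.encode("utf-8"), 9)


def decompress_text(payload: bytes) -> str:
    """Recover the original text at the reception side."""
    return zlib.decompress(payload).decode("utf-8")
```

The compression is lossless, so the text handed to the text-to-speech converter after decompression is identical to the text that was entered.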

A second embodiment of an apparatus for providing 
described television services for an audiovisual program is 
illustrated in FIG. 4. This apparatus includes an input 
terminal 401 into which description data is input by one or 
more caption editors; a speech synthesizer 402 which con- 
verts the description data into a speech signal; a storage unit 
403 for storing the speech signal along with an accompa- 
nying time code signal provided, for example, from VTR 
404; and a mixer 405 which receives the speech signal and 
mixes it with the audio signal from the audiovisual program 
using the time code signals from the program. The mixed
audio signal including both the audio track of the audiovi- 
sual program and the description speech signal is transmitted 
by transmitter 406 to a viewer's television set 407, for 
example, via the SAP channel, simultaneously with the
video signal and the audio track of the audiovisual program. 
As suggested in connection with the above-described first 
embodiment of the present invention, the description data 
may be automatically translated into a selected foreign 
language via an automatic translator (not shown) known in 
the art prior to providing the description data to the text-to- 
speech synthesizer. 

The speech synthesizer 402 may be an off-the-shelf 
text-to-speech circuit or software program which converts 
text into an audio speech signal as described above with 
reference to FIGS. 1 and 2. The input terminal 401 may be 
a desktop computer having an attached keyboard 410 for 
entering the description data. A real-time stenographer keys
in the description text via a second keyboard 411, such as a
steno-keyboard connected to terminal 401, which enables
descriptions to be delivered with live programs such as news
and sporting events.

The storage unit 403 may be a hard drive attached to the 
desktop computer. The mixer circuit 405 may be a summing 
circuit which sums the audio signal from the soundtrack of 
the audiovisual program with the speech signal produced by 
the speech synthesizer 402. The transmitter 406 may be a 
radio frequency broadcast transmitter, cable television 
transmitter, direct broadcast satellite transmitter or other
suitable type of television transmitter known in the art. 
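
A minimal sketch of how a summing mixer might place stored speech clips into the program audio according to their time codes is given below. The `HH:MM:SS:FF` time code format, the 30 frames-per-second non-drop rate, the tiny `SAMPLES_PER_FRAME` value, and the function names are all assumptions made for illustration; the specification does not fix any of them.

```python
from typing import List, Sequence, Tuple

FRAME_RATE = 30         # assumed non-drop-frame rate
SAMPLES_PER_FRAME = 4   # hypothetical, kept tiny so the example is readable


def timecode_to_frames(timecode: str, fps: int = FRAME_RATE) -> int:
    """Convert an assumed HH:MM:SS:FF time code to an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def mix_descriptions(
    program: Sequence[float],
    clips: Sequence[Tuple[str, Sequence[float]]],
    samples_per_frame: int = SAMPLES_PER_FRAME,
) -> List[float]:
    """Sum each stored speech clip into the program audio at its time code."""
    mixed = list(program)
    for timecode, speech in clips:
        start = timecode_to_frames(timecode) * samples_per_frame
        for offset, sample in enumerate(speech):
            index = start + offset
            if index >= len(mixed):
                break  # clip runs past the end of the program audio
            mixed[index] += sample
    return mixed
```

Each clip is stored as a pair of time code and samples, which corresponds to the storage unit 403 holding the speech signal along with its accompanying time code signal.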

A third embodiment of the apparatus for providing
described television services according to the present invention
is shown in FIG. 4A. The apparatus in FIG. 4A includes
the storage unit 403, VTR 404, mixer 405, transmitter 406,
and viewer television set 407 as shown in FIG. 4. However,
in the embodiment illustrated in FIG. 4A, the description
data is generated by one or more human speakers who input
(i.e., speak) the description data in the form of an analog
signal into a recorder 420, for example an analog or digital
audio tape (DAT) recorder. The recorder 420 creates a digital
audio tape of the description data. Subsequently, the
recorder 420 outputs the recorded digital speech signal
which is synchronized to the master video tape of the audiovisual
program played by VTR 404 using time code signals
by mixer 405. The mixer 405 then mixes the synchronized
speech signal in real time to interleave the digital speech
signal with the sum of the left and right stereo audio
channels or with the mono audio signal of the audiovisual
program. Transmitter 406 feeds the combined signal including
the audio track of the audiovisual program and the digital
speech signal directly into the SAP channel which is
transmitted to the viewer's television 407.
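
The combination of the summed stereo channels with the synchronized speech signal can be sketched as follows. The sample range of -1.0 to 1.0, the clamping of the result, the `speech_gain` parameter, and the function names are assumptions for illustration only; the specification states only that the speech signal is mixed with the sum of the left and right channels or with the mono audio signal.

```python
from typing import List, Sequence


def sum_stereo(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Form the sum of the left and right stereo channels."""
    return [l + r for l, r in zip(left, right)]


def build_sap_track(
    left: Sequence[float],
    right: Sequence[float],
    speech: Sequence[float],
    speech_gain: float = 1.0,
) -> List[float]:
    """Mix already-synchronized speech with the summed program audio.

    Samples are assumed to lie in [-1.0, 1.0]; the result is clamped to
    that range, which is a choice made for this sketch.
    """
    program = sum_stereo(left, right)
    # Pad the speech with silence so it covers the whole program span.
    padded = list(speech) + [0.0] * (len(program) - len(speech))
    return [
        max(-1.0, min(1.0, p + speech_gain * s))
        for p, s in zip(program, padded)
    ]
```

For a program that is already mono, the same mixing step applies with the mono signal used in place of the channel sum.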

According to this embodiment of the apparatus for pro- 
viding described television services according to the present 
invention, a foreign language (e.g., Spanish) SAP signal may 
be transmitted in addition to the speech signal. The foreign 
language SAP signal may be transmitted either without any 
accompanying speech signal (description data) or mixed 
with a corresponding speech signal in the foreign language. 

A method of providing described television services
according to the second embodiment of the present invention
is illustrated in FIG. 5. This method includes the steps
of (501) generating description data corresponding to an
audiovisual program; (502) converting the description data
to a speech signal corresponding to the description data
using, for example, a text-to-speech synthesizer or a
recorder which records a human speaker; (503) synchronizing
the speech signal with the audiovisual program using a
time code signal from the audiovisual program; (504) mixing
the synchronized speech signal with the audio track of
the audiovisual program to create a combined audio signal;
and (505) simultaneously transmitting the combined audio
signal and the audiovisual program to the viewer by a
suitable transmission apparatus as described above. The
combined audio signal may be transmitted to the viewer, for
example, over the SAP channel which is received by television
sets having stereo capacity. Therefore, the customer
does not need special equipment to receive the described
television services.

This method according to the present invention may also 
include a translation step to support multiple languages. For
example, the English text may be translated into text in a
foreign language by a translator (not shown), for example, a 
translating device or a human translator. The translated text 
is provided to the text-to-speech synthesizer 402 (FIG. 4) or 
recorder 420 (FIG. 4A). 

In addition to support for prerecorded television programs
as described above, the apparatus and method according to
the present invention may be used for live performances,
speeches, classrooms, and other types of presentations.
Further, the apparatus and method according to the present
invention may also be used for teleconferences, distance
learning programs, and other televised programming in
addition to movies and television series.
While the present invention has been particularly
described with reference to the preferred embodiments, it
should be readily apparent to those of ordinary skill in the art
that changes and modifications in form and details may be
made without departing from the spirit and scope of the
invention. It is intended that the appended claims include
such modifications.

Claimed is:

1. An apparatus for providing described television 
services, comprising: 

description data receiving means for receiving description 
data corresponding to an audiovisual program; 

a translator for translating said description data into a 
foreign language; 

a text-to-speech converter for converting said description 
data into a speech signal corresponding to said descrip- 
tion data; 

storage means for receiving and storing said speech signal 
and a corresponding time code signal from the audio- 
visual program; and 

a mixing circuit for mixing said retrieved speech signal 
with the audio track of the audiovisual program accord- 
ing to said time code signal to produce a combined 
audio signal. 

2. An apparatus for providing described television 
services, comprising: 

a translator for translating description data into a foreign 
language; 

recording means for recording a speech signal corre- 
sponding to said description data for an audiovisual 
program; 

synchronizing means for synchronizing said speech signal
with the audiovisual program using a time code signal
from the audiovisual program; and

a mixing circuit for mixing said synchronized speech
signal with the audio track of the audiovisual program
to produce a combined audio signal.

3. A method for providing described television services, 
comprising the steps of: 

generating description data corresponding to an audiovi- 
sual program; 

translating said description data into a foreign language; 

converting said description data to a speech signal corre- 
sponding to said description data; 

synchronizing said speech signal with the audiovisual 
program using a time code signal from the audiovisual 
program; and 

mixing said synchronized speech signal with the audio 
track of the audiovisual program to create a combined 
audio signal. 

4. Apparatus for providing described television services 
comprising 
means for generating a description data signal, said
description data signal representing descriptive data
comprising auxiliary data transmitted for describing a
non-verbal portion of an audiovisual program,

means for marking said description data signal,

means for inserting said marked description data signal
into a channel for closed captioning data, and

means for transmitting said closed captioning data channel.

5. The apparatus of claim 4 wherein said closed caption- 
ing data channel is simultaneously applied for closed cap- 
tioning for the hearing-impaired. 

6. The apparatus of claim 5 wherein said marking step
comprises marking said description data differently from
closed captioning data.

7. The apparatus of claim 4 wherein said closed captioning
data channel is a separate channel from one applied for
closed captioning data for an audio portion of an audio-visual
program.

8. The apparatus of claim 1 wherein said description data 
receiving means is responsive to one of a marker for 
marking said description data or a separate channel from a 
closed caption data channel representing an audio portion of
said audiovisual program.

9. The apparatus of claim 2 further comprising one of a 
marking means for marking said descriptive data or receiv- 
ing means responsive to a separate channel from a closed 
caption data channel representing an audio portion of said
audiovisual program.

10. The method of claim 3 further comprising the step of 
one of marking said description data or transmitting said 
description data on a separate channel from a closed caption 
data channel representing an audio portion of said audiovisual
program.

11. Apparatus for providing described television services, 
comprising: 

a translator for translating description data for an audio- 
visual program into a foreign language, 

recording means for recording a speech signal corre- 
sponding to said translated description data for said 
audiovisual program, 

synchronizing means for synchronizing said speech signal 
with an audio track signal of said audiovisual program 
using a time code signal from said audiovisual 
program, and 

a mixing circuit for mixing said synchronized speech 
signal with said audio track signal of said audiovisual 
program to produce a combined audio signal.

12. The apparatus of claim 11 wherein said description 
data comprises auxiliary data transmitted for describing the 
non-verbal portion of an audiovisual program. 

* * * * *